Interactive Teaching Algorithms for Inverse Reinforcement Learning
We study the problem of inverse reinforcement learning (IRL) with the added
twist that the learner is assisted by a helpful teacher. More formally, we
tackle the following algorithmic question: How could a teacher provide an
informative sequence of demonstrations to an IRL learner to speed up the
learning process? We present an interactive teaching framework where a teacher
adaptively chooses the next demonstration based on the learner's current policy. In
particular, we design teaching algorithms for two concrete settings: an
omniscient setting where a teacher has full knowledge about the learner's
dynamics and a blackbox setting where the teacher has minimal knowledge. Then,
we study a sequential variant of the popular MCE-IRL learner and prove
convergence guarantees of our teaching algorithm in the omniscient setting.
Extensive experiments with a car driving simulator environment show that the
learning progress can be sped up drastically compared to an uninformative
teacher.
Comment: IJCAI'19 paper (extended version)
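As a rough illustration of the omniscient setting, the sketch below has a teacher that knows the learner's update rule and picks the demonstration whose resulting update lands closest to a target parameter. The abstract does not specify any of this: the linear reward parameter `lam`, the feature-matching gradient step in `learner_update`, the step size `eta`, and the finite pool `demo_features` are all assumptions made for the example.

```python
import numpy as np

def learner_update(lam, mu_policy, mu_demo, eta=0.1):
    """Assumed learner step: move the reward parameter toward matching
    the demonstration's feature counts (a gradient-style update)."""
    return lam - eta * (mu_policy - mu_demo)

def omniscient_pick(lam, lam_star, mu_policy, demo_features, eta=0.1):
    """Hypothetical omniscient teacher: simulate the learner's update for
    each candidate demonstration and return the index of the one that
    brings the parameter closest to the target lam_star."""
    def distance_after(i):
        lam_next = learner_update(lam, mu_policy, demo_features[i], eta)
        return np.linalg.norm(lam_next - lam_star)
    return min(range(len(demo_features)), key=distance_after)

# Toy pool of three demonstrations, each summarized by its feature counts.
demo_features = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
lam = np.zeros(2)                 # learner's current parameter
lam_star = np.array([1.0, 0.0])   # teacher's target parameter
mu_policy = np.zeros(2)           # feature counts of the learner's current policy

choice = omniscient_pick(lam, lam_star, mu_policy, demo_features)
```

A blackbox teacher could not call `learner_update` directly and would have to rely on what it can observe of the learner, which is why the two settings call for different algorithms.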
Learning to Play Text-based Adventure Games with Maximum Entropy Reinforcement Learning
Text-based games are a popular testbed for language-based reinforcement
learning (RL). In previous work, deep Q-learning has commonly been used as the
learning agent. Q-learning algorithms are challenging to apply to complex
real-world domains due to, for example, their instability in training.
Therefore, in this paper, we adapt the soft actor-critic (SAC) algorithm to the
text-based environment. To deal with sparse extrinsic rewards from the
environment, we combine it with a potential-based reward shaping technique to
provide more informative (dense) reward signals to the RL agent. We apply our
method to play difficult text-based games. The SAC method achieves higher
scores than the Q-learning methods on many games with only half the number of
training steps. This shows that it is well-suited for text-based games.
Moreover, we show that the reward shaping technique helps the agent to learn
the policy faster and achieve higher scores. In particular, we consider a
dynamically learned value function as a potential function for shaping the
learner's original sparse reward signals.
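For readers unfamiliar with potential-based reward shaping, the standard form adds gamma * Phi(s') - Phi(s) to the environment reward. The sketch below shows that formula only; it is not the paper's implementation. The function name `shaped_reward`, the discount value, and the convention of treating the potential of a terminal state as zero are assumptions for illustration. In the setting described above, the potentials would come from the learned value function rather than fixed numbers.

```python
def shaped_reward(reward, phi_s, phi_next, gamma=0.99, done=False):
    """Potential-based shaping: reward + gamma * Phi(s') - Phi(s).
    Assumes the potential of a terminal state is zero."""
    next_potential = 0.0 if done else phi_next
    return reward + gamma * next_potential - phi_s

# Toy episode with zero extrinsic reward and hand-picked potentials.
potentials = [3.0, 1.0, 4.0]
gamma = 0.5
steps = [
    shaped_reward(0.0, potentials[0], potentials[1], gamma),
    shaped_reward(0.0, potentials[1], potentials[2], gamma),
    shaped_reward(0.0, potentials[2], 0.0, gamma, done=True),
]

# The discounted shaping terms telescope to -Phi(s0), so shaping changes
# every return from s0 by the same constant.
discounted_sum = sum(gamma ** t * r for t, r in enumerate(steps))
```

Because the shaping terms telescope in this way, the ranking of policies is unchanged while the agent still receives a denser signal at each step.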